A FAST SEARCH ALGORITHM FOR (m, m, m) TRIPLE PRODUCT
PROPERTY TRIPLES AND AN APPLICATION FOR 5×5 MATRIX
MULTIPLICATION
Abstract. We present a new fast search algorithm for (m, m, m) Triple Product Property
(TPP) triples as defined by Cohn and Umans in 2003. The new algorithm achieves a speed-up
factor between 40 and 194 in comparison to the best known search algorithm. With a parallelized
version of the new algorithm we are able to search for TPP triples in groups up to order 55.

As an application we identify a list of groups that would realize 5×5 matrix multiplication
with under 100 resp. 125 scalar multiplications (the best known upper bound by Makarov 1987
resp. the trivial upper bound) if they contain a (5, 5, 5) TPP triple. With our new algorithm we
show that no group can realize 5×5 matrix multiplication better than Makarov's algorithm.
1. Introduction

1.1. A Very Short History of Fast Matrix Multiplication. The naive algorithm for matrix
multiplication is an $O(n^3)$ algorithm. From Strassen [15] we know that there is an $O(n^{2.81})$
algorithm for this problem. One of the most famous results is an $O(n^{2.376})$ algorithm from
Coppersmith and Winograd [1]. Recently, Williams [16] found an algorithm with $O(n^{2.3727})$
run-time based on the work of Stothers [14]. Let $M(n)$ denote the number of field operations
in characteristic 0 required to multiply two $(n \times n)$ matrices. Then we call
$\omega := \inf\{r \in \mathbb{R} : M(n) = O(n^r)\}$ the exponent of matrix multiplication. Details about the
complexity of matrix multiplication and the exponent $\omega$ can be found in [1].



1.2. A Very Short History of Small Matrix Multiplication. The naive algorithm uses $n^3$
multiplications and $n^3 - n^2$ additions to compute the product of two $n \times n$ matrices. The famous
result $O(n^{2.81})$ is based on an algorithm that can compute the product of two $2 \times 2$ matrices
with only 7 multiplications. Winograd [17] proved that the minimum number of multiplications
required in this case is 7. The exact number $R(n)$ of required multiplications to compute the
product of two $n \times n$ matrices is not known for $n > 2$. There are known upper bounds for
some cases. Table 1 lists the known upper bounds for $R(n)$ up to $n = 5$. Tables for up to
$n = 30$ can be found in [5, Section 4]. Hedtke and Murthy proved in [9, Theorem 7.3] that the
group-theoretic framework (discussed in Subsection 1.4) is not able to produce better bounds
for $R(3)$ and $R(4)$.
n × n    upper bound for R(n)    algorithm
2 × 2    7                       Strassen [15]
3 × 3    23                      Laderman [10]
4 × 4    49                      Strassen [15]
5 × 5    100                     Makarov [11]

Table 1. Upper bounds for R(2), R(3), R(4) and R(5).



1.3. Bilinear Complexity. Later we will use the concept of bilinear complexity to connect
group-theoretic arguments with the complexity of matrix multiplication.

Definition 1.1 (Rank). [1, Chapter 14 and Definition 14.7] Let $k$ be a field and $U$, $V$, $W$ finite
dimensional $k$-vector spaces. Let $\eta : U \times V \to W$ be a $k$-bilinear map. For $i \in \{1, \dots, r\}$ let
$f_i \in U^*$, $g_i \in V^*$ (dual spaces of $U$ and $V$ resp. over $k$) and $w_i \in W$ such that

$$\eta(u, v) = \sum_{i=1}^{r} f_i(u)\, g_i(v)\, w_i$$

for all $u \in U$ and $v \in V$. Then $(f_1, g_1, w_1; \dots; f_r, g_r, w_r)$ is called a $k$-bilinear algorithm of
length $r$ for $\eta$, or simply a bilinear algorithm when $k$ is fixed. The minimal length of all bilinear
algorithms for $\eta$ is called the rank $R(\eta)$ of $\eta$. Let $A$ be a $k$-algebra. The rank $R(A)$ of $A$ is
defined as the rank of its bilinear multiplication map.

Definition 1.2 (Restriction of a bilinear map). [1, Definition 14.27] Let $\varphi : U \times V \to W$ and
$\varphi' : U' \times V' \to W'$ be $k$-bilinear maps. A $k$-restriction, or simply a restriction (when $k$ is fixed),
of $\varphi'$ to $\varphi$ is a triple $(\sigma, \tau, \zeta')$ of linear maps $\sigma : U \to U'$, $\tau : V \to V'$ and $\zeta' : W' \to W$ such that
$\varphi = \zeta' \circ \varphi' \circ (\sigma \times \tau)$:

      U × V   ---φ--->   W
        |                ^
  σ × τ |                | ζ'
        v                |
      U' × V' ---φ'--->  W'

We write $\varphi \le \varphi'$ if there exists a restriction of $\varphi'$ to $\varphi$.

1.4. The Group-Theoretic Approach of Cohn and Umans. In 2003 Cohn and Umans
introduced in [3] a group-theoretic approach to fast matrix multiplication. The main idea of
their framework is to embed the matrix multiplication over a ring $R$ into the group ring $R[G]$ of
a group $G$. A group $G$ admits such an embedding if there are subsets $S$, $T$ and $U$ of $G$ which
satisfy the so-called Triple Product Property.

Definition 1.3 (right quotient). Let $G$ be a group and $X$ be a nonempty subset of $G$. The
right quotient $Q(X)$ of $X$ is defined by $Q(X) := \{x y^{-1} : x, y \in X\}$.

Definition 1.4 (Triple Product Property). We say that the nonempty subsets $S$, $T$ and $U$ of
a group $G$ satisfy the Triple Product Property (TPP) if for $s \in Q(S)$, $t \in Q(T)$ and $u \in Q(U)$,
$stu = 1$ holds if and only if $s = t = u = 1$.

Let $k$ be a field. By $\langle n, p, m \rangle_k$ we denote the bilinear map $k^{n \times p} \times k^{p \times m} \to k^{n \times m}$, $(A, B) \mapsto AB$
describing the multiplication of $n \times p$ by $p \times m$ matrices over $k$. When $k$ is fixed, we simply
write $\langle n, p, m \rangle$. Unless otherwise stated we will only work over $k = \mathbb{C}$ in the entire paper. We
say that a group $G$ realizes $\langle n, p, m \rangle$ if there are subsets $S, T, U \subseteq G$ of sizes $|S| = n$, $|T| = p$
and $|U| = m$, which satisfy the TPP. In this case we call $(S, T, U)$ a TPP triple of $G$, and we
define its size to be $npm$.
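Definitions 1.3 and 1.4 can be checked directly on a small group. The following is a minimal sketch, assuming group elements are stored as permutation tuples; the helper names (`compose`, `inverse`, `right_quotient`, `satisfies_tpp`) are illustrative and do not come from the paper. It tests every product $stu$ by brute force, which is only practical for tiny examples.

```python
from itertools import product

def compose(p, q):
    """Product p*q of two permutations given as tuples (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def right_quotient(X):
    """Q(X) = { x * y^-1 : x, y in X } as in Definition 1.3."""
    return {compose(x, inverse(y)) for x in X for y in X}

def satisfies_tpp(S, T, U):
    """Brute-force check of Definition 1.4 over all s in Q(S), t in Q(T), u in Q(U)."""
    identity = tuple(range(len(next(iter(S)))))
    for s, t, u in product(right_quotient(S), right_quotient(T), right_quotient(U)):
        if compose(compose(s, t), u) == identity and (s, t, u) != (identity,) * 3:
            return False
    return True

# Sym(3) acting on the points 0, 1, 2: the identity and the three transpositions.
e, a, b, c = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)
print(satisfies_tpp({e, a}, {e, b}, {e, c}))  # three different transpositions
print(satisfies_tpp({e, a}, {e, a}, {e, c}))  # S = T cannot satisfy the TPP
```

The first call succeeds because a product of one, two or three distinct transpositions is never the identity; the second fails because $s = t = a$, $u = 1$ gives $stu = 1$ with $s \neq 1$.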
Definition 1.5 (TPP capacity). We define the TPP capacity $\beta(G)$ of a group $G$ as
$\beta(G) := \max\{npm : G \text{ realizes } \langle n, p, m \rangle\}$.

Let us now focus on the embedding of the matrix multiplication into $\mathbb{C}[G]$. Let $G$ realize
$\langle n, p, m \rangle$ through the subsets $S$, $T$ and $U$. Let $A$ be an $n \times p$ and $B$ be a $p \times m$ matrix. We
index the entries of $A$ and $B$ with the elements of $S$, $T$ and $U$ instead of numbers. Now we have

$$(AB)_{s,u} = \sum_{t \in T} A_{s,t} B_{t,u}.$$

Cohn and Umans showed that this is the same as the coefficient of $s^{-1}u$ in the product

$$\Bigl(\sum_{s \in S,\, t \in T} A_{s,t}\, s^{-1} t\Bigr) \Bigl(\sum_{\hat{t} \in T,\, u \in U} B_{\hat{t},u}\, \hat{t}^{-1} u\Bigr). \qquad (1)$$

So we can read off the matrix product from the group ring product by looking at the coefficients
of $s^{-1}u$ with $s \in S$ and $u \in U$.
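To make the embedding concrete, here is a small sketch that multiplies two 2×2 integer matrices through the group ring of Sym(3). It assumes the TPP triple $S = \{1, a\}$, $T = \{1, b\}$, $U = \{1, c\}$ with three different transpositions $a, b, c$; the dictionary-based group ring and all function names are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict

def compose(p, q):
    """Product p*q of permutation tuples (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def embed(M, rows, cols):
    """Group ring element sum_{x,y} M[x][y] * (x^-1 y), stored as {element: coefficient}."""
    elt = defaultdict(int)
    for i, x in enumerate(rows):
        for j, y in enumerate(cols):
            elt[compose(inverse(x), y)] += M[i][j]
    return elt

def group_ring_product(f, g):
    """Convolution product of two group ring elements."""
    result = defaultdict(int)
    for x, cx in f.items():
        for y, cy in g.items():
            result[compose(x, y)] += cx * cy
    return result

def multiply_via_group_ring(A, B, S, T, U):
    """Read (AB)_{s,u} off as the coefficient of s^-1 u in the product (1)."""
    prod = group_ring_product(embed(A, S, T), embed(B, T, U))
    return [[prod[compose(inverse(s), u)] for u in U] for s in S]

e, a, b, c = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)
S, T, U = [e, a], [e, b], [e, c]
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply_via_group_ring(A, B, S, T, U))  # [[19, 22], [43, 50]]
```

The TPP guarantees that $s'^{-1} t\, \hat{t}^{-1} u' = s^{-1} u$ forces $s = s'$, $t = \hat{t}$ and $u = u'$, so no unwanted terms contribute to the coefficient that is read off.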

Definition 1.6 (r-character capacity). Let $G$ be a group with the character degrees $\{d_i\}$. We
define the $r$-character capacity of $G$ as $D_r(G) := \sum_i d_i^r$.

We write $R(n, p, m)$ for the rank of the bilinear map $\langle n, p, m \rangle$, and $R(n)$ for $R(n, n, n)$. If
$G$ realizes $\langle n, p, m \rangle$ then $\langle n, p, m \rangle \le \mathbb{C}[G]$ (see [3, Theorem 2.3]) by the construction above and
therefore $R(n, p, m) \le R(\mathbb{C}[G]) =: R(G)$:

  C^{n×p} × C^{p×m}  ---matrix multiplication--->  C^{n×m}
        |                                            ^
        | embedding                                  | (AB)_{s,u} = coefficient
        | into C[G]                                  | of s^{-1}u in (1)
        v                                            |
    C[G] × C[G]  ---multiplication in C[G]--->     C[G]

From Wedderburn's structure theorem it follows that $R(G) \le \sum_i R(d_i)$. The exact value of $R(G)$
is known only in a few cases. So, usually we will work with the upper bound $D_3(G) \ge \sum_i R(d_i)$,
which follows from the rank of the naive matrix multiplication algorithm for $\langle d, d, d \rangle$. We can
now use $\beta(G)$ and $D_r(G)$ to get new bounds for $\omega$:

Theorem 1.7. [3, Theorem 4.1] If $G \ne 1$ is a finite group, then $\beta(G)^{\omega/3} \le D_\omega(G)$.

Finally we collect some results to improve the performance of our algorithms in the next 
sections. 

Lemma 1.8. [3, Lemma 2.1] Let $(S, T, U)$ be a TPP triple. Then for every permutation
$\pi \in \mathrm{Sym}(\{S, T, U\})$ the triple $(\pi(S), \pi(T), \pi(U))$ satisfies the TPP.

Lemma 1.9. [12, Observation 2.1] Let $G$ be a group. If $(S, T, U)$ is a TPP triple of $G$, then
$(dSa, dTb, dUc)$ is a TPP triple for all $a, b, c, d \in G$, too.

Lemma 1.9 is one of the most useful results about TPP triples. It allows us to restrict the
search for TPP triples to sets that satisfy $1 \in S \cap T \cap U$.

Definition 1.10 (Basic TPP triple). Following Neumann [12], we shall call a TPP triple
$(S, T, U)$ with $1 \in S \cap T \cap U$ a basic TPP triple.

For that reason, we will assume throughout that every TPP triple is a basic TPP triple.

Lemma 1.11. [12, Observation 3.1] If $(S, T, U)$ is a TPP triple of $G$, then $|S|(|T| + |U| - 1) \le |G|$,
$|T|(|S| + |U| - 1) \le |G|$ and $|U|(|S| + |T| - 1) \le |G|$.
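Lemma 1.11 gives a cheap necessary condition that can be evaluated before any search is started. The sketch below simply evaluates the three inequalities; the function names are illustrative assumptions.

```python
def neumann_feasible(group_order, n, p, m):
    """Necessary condition of Lemma 1.11 for a group of this order to realize <n, p, m>."""
    return (n * (p + m - 1) <= group_order
            and p * (n + m - 1) <= group_order
            and m * (n + p - 1) <= group_order)

def min_order_for_square_triple(m):
    """Smallest group order that Lemma 1.11 allows for an (m, m, m) triple."""
    return m * (2 * m - 1)

print(min_order_for_square_triple(3))  # 15
print(min_order_for_square_triple(5))  # 45
print(neumann_feasible(44, 5, 5, 5))   # False
```

For $m = 5$ this yields the lower bound $|G| \ge 45$ used for the candidate lists in Section 3.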

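Theorem 1.12. [9, Theorem 3.1] Three sets $S_1$, $S_2$ and $S_3$ form a TPP triple $(S_1, S_2, S_3)$ if
and only if for all $\pi \in \mathrm{Sym}(3)$

$$1 \in S_1 \cap S_2 \cap S_3, \quad Q(S_{\pi(1)}) \cap Q(S_{\pi(2)}) = \{1\}, \quad \text{and} \quad Q(S_{\pi(1)}) \cap Q(S_{\pi(2)})\,Q(S_{\pi(3)}) = \{1\}.$$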
1.5. The Aim of this Paper. The second and fourth authors of this paper created what we
believe are currently the most efficient search algorithms for TPP triples [9]. They also showed
that the presented group-theoretic framework is not able to give us new and better algorithms
for the multiplication of 3 × 3 and 4 × 4 matrices over the complex numbers.

To attack the 5 × 5 matrix multiplication problem we develop a new efficient search algorithm
for (m, m, m) (especially (5, 5, 5)) TPP triples. For this special case of TPP triples it is faster
than any other search algorithm and it can easily be parallelized to run on a supercomputer.

Even with the new algorithm, it is not feasible simply to test all groups of order less than
100 (best known upper bound for $R(5)$) for (5, 5, 5) triples. Therefore we develop theoretical
methods to reduce the list of candidates that must be checked. We show that the group-theoretic
framework cannot give us a new upper bound for $R(5)$.

We will also produce a list of groups that could in theory realize a nontrivial (with fewer than
125 scalar multiplications) multiplication algorithm for 5 × 5 matrices. Additionally we show
how it could be possible to construct a matrix multiplication algorithm from a given TPP triple.

2. The Search Algorithm for (m, m, m) TPP Triples

In this section we describe the basic idea and important implementation details for our new fast
search algorithm for (m, m, m) triples. The goal of the algorithm is to find possible candidates
for TPP triples $(S, T, U)$ using the following necessary conditions:

$$1 \in S \cap T \cap U \quad \text{and} \quad Q(S) \cap Q(T) = Q(S) \cap U = Q(T) \cap U = \{1\}. \qquad (2)$$

The second condition is a weaker formulation of the known result using $Q(U)$ (in Theorem 1.12),
but it is more useful in our algorithm. Because the conditions in (2) are not sufficient, each TPP
candidate that comes from the algorithm is then tested to see whether it satisfies the TPP (e.g. with
a TPP test from [9, Section 4]).

Let $G$ be a finite group. Let $n := |G| - 1$. Let $(g_0 := 1_G, g_1, \dots, g_n)$ be an arbitrary but fixed
order of the elements of $G$. We want to find an (m, m, m) TPP triple $(S, T, U)$ (or possible TPP
triple candidates) of subsets of $G$. For this, we will represent $S$, $T$ and $U$ via their basic binary
representation:

Definition 2.1 (binary representation). If $X$ is an arbitrary subset of $G$ we write the binary
representation $b_X$ of $X$ as an element of $\{0, 1\}^{|G|}$, where $(b_X)_\ell = 1$ if and only if $g_\ell \in X$ and
$(b_X)_\ell = 0$ otherwise $(0 \le \ell \le n)$.

Because we only consider basic TPP triples, $(b_S)_0 = (b_T)_0 = (b_U)_0 = 1$, so we only need to
consider the binary representations for $1 \le \ell \le n$. We call this the basic binary representation
$b_S^*$, $b_T^*$ and $b_U^*$. We define $\mathrm{supp}(b_X^*) := \{i : (b_X^*)_i = 1\} = \{i : i > 0,\ g_i \in X\}$ as the support of a
basic binary representation $b_X^*$. For example, if $|G| = 8$ and $S = \{1, g_2, g_4, g_7\}$, then

$b_S = (1,0,1,0,1,0,0,1)$

$b_S^* = (0,1,0,1,0,0,1)$

$\mathrm{supp}(b_S^*) = \{2,4,7\}$.
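The three representations are easy to mirror in code. The following sketch assumes a subset $X$ is given by the set of indices $\ell$ with $g_\ell \in X$ (index 0 being the identity); the function names are illustrative.

```python
def binary_representation(indices, group_order):
    """b_X as a tuple of 0/1 flags; `indices` holds the l with g_l in X (g_0 is the identity)."""
    return tuple(1 if l in indices else 0 for l in range(group_order))

def basic_binary_representation(indices, group_order):
    """b*_X: the binary representation without the entry for the identity g_0."""
    return binary_representation(indices, group_order)[1:]

def support(basic_row):
    """supp(b*_X): the 1-based positions whose entry is 1."""
    return {l for l, bit in enumerate(basic_row, start=1) if bit == 1}

S = {0, 2, 4, 7}  # S = {1, g_2, g_4, g_7} in a group of order 8
b_S = binary_representation(S, 8)
b_S_star = basic_binary_representation(S, 8)
print(b_S)                # (1, 0, 1, 0, 1, 0, 0, 1)
print(b_S_star)           # (0, 1, 0, 1, 0, 0, 1)
print(support(b_S_star))  # {2, 4, 7}
```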

We want to sketch the basic idea behind the algorithms with a matrix representation of the
possible TPP candidates. This representation is not efficient and will not be used in the algorithms
themselves. It is only used in this subsection to describe the method. Let $C \in \{0, 1\}^{3 \times n}$ denote a
matrix representation of a possible TPP candidate. Each row of

        ( b*_S )
    C = ( b*_T )
        ( b*_U )

is the basic binary representation of $S$, $T$, resp. $U$. We can describe the fundamental idea with
three steps:

(S1) The "moving 1" principle to find the next possible TPP triple candidate after a TPP
test for the previous candidates fails.

(S2) The "marking the quotient" routine to realize Equation (2).

(S3) An efficient way to store the matrix $C$ and access its entries.



2.1. The "moving 1" principle. The "moving 1" principle is based on two observations and
an idea:

Observations. (1) The column sums of $C$ are at most 1.

(2) We can restrict the search space for TPP triples with the condition
$\min(\mathrm{supp}(b_S^*)) < \min(\mathrm{supp}(b_T^*)) < \min(\mathrm{supp}(b_U^*))$.

Proof. (1) If $M$ is a set with $1_G \in M$ it follows that $M \subseteq Q(M)$. Using Equation (2), we
get that $X \cap Y = \{1\}$ for all $X \ne Y \in \{S, T, U\}$. Thus, $\mathrm{supp}(b_X^*) \cap \mathrm{supp}(b_Y^*) = \emptyset$ for all
$X \ne Y \in \{S, T, U\}$. This proves the statement.
(2) Follows immediately from Lemma 1.8 and the fact that we are looking for TPP triples
$(S, T, U)$ with $|S| = |T| = |U|$. □

The idea of the "moving 1" is as follows: After a TPP test fails we get the next candidate by
moving the rightmost 1 in $b_U^*$ one step to the right. If this is not possible, delete the rightmost
1 in $b_U^*$ and move the new rightmost 1. Finally we add the missing 1 to a free spot (remember
that the column sums of $C$ are at most 1).

If it is not possible (all 1's are at the right of $b_U^*$) to move a 1 in $b_U^*$, we delete the whole
line $b_U^*$ and move a 1 in $b_T^*$. After this we rebuild a new line $b_U^*$ from scratch using the two
observations above. We do the same with line $b_S^*$ if no more moves in line $b_T^*$ are possible.

Example. Let $G$ be a group of order 9. We are looking for (3, 3, 3) TPP triples. The initial
configuration of $C \in \{0, 1\}^{3 \times 8}$ would be

        ( 1 1 0 0 0 0 0 0 )   b*_S = (1,1,0,0,0,0,0,0)
    C = ( 0 0 1 1 0 0 0 0 )   b*_T = (0,0,1,1,0,0,0,0)
        ( 0 0 0 0 1 1 0 0 )   b*_U = (0,0,0,0,1,1,0,0)

which means that $S = \{1_G, g_1, g_2\}$, $T = \{1_G, g_3, g_4\}$ and $U = \{1_G, g_5, g_6\}$. Now we check if
$(S, T, U)$ satisfies the TPP. If so, we are finished. If not, we generate the next candidate by
moving a 1 in $C$:

        ( 1 1 0 0 0 0 0 0 )
    C = ( 0 0 1 1 0 0 0 0 )
        ( 0 0 0 0 1 → 1 0 )

Now $U = \{1_G, g_5, g_7\}$ and we check the TPP again. The procedure of the "moving 1" continues
if the TPP check fails:

[Sequence of seven further configurations of C, each obtained from its predecessor by moving a 1
(the moved entry is marked with an arrow); the individual matrices are not legible in this copy.]
In contrast to the example above, the next subsection takes care of $Q(S)$ and $Q(T)$ in Eq. (2).
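The "moving 1" step for a single row can be sketched as a next-combination routine. This is an illustrative sketch, not the authors' GAP code: `free` is assumed to be the sorted list of positions the row may use, and the "missing 1" is assumed to be re-added at the leftmost free spot to the right of the moved 1, which the text does not specify.

```python
def next_row(support, free):
    """Return the support that follows `support` under the "moving 1" rule, or None.

    `support` is a sorted tuple of positions holding a 1; `free` is the sorted list
    of positions this row may use (an assumption made for this sketch).
    """
    k = len(support)
    pos = [free.index(x) for x in support]  # ranks of the 1s inside `free`
    i = k - 1
    # Find the rightmost 1 that can still move one step to the right.
    while i >= 0 and pos[i] == len(free) - k + i:
        i -= 1
    if i < 0:
        return None  # all 1s are at the right end: the row is exhausted
    pos[i] += 1
    # Re-add the deleted 1s directly to the right of the moved 1.
    for j in range(i + 1, k):
        pos[j] = pos[j - 1] + 1
    return tuple(free[p] for p in pos)

# Row b*_U of the example: S and T occupy positions 1..4, so 5..8 are free.
row, sequence = (5, 6), []
while row is not None:
    sequence.append(row)
    row = next_row(row, [5, 6, 7, 8])
print(sequence)  # [(5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
```

The first two entries reproduce the step from $U = \{1_G, g_5, g_6\}$ to $U = \{1_G, g_5, g_7\}$ in the example; once the row is exhausted, a 1 in the row above would be moved and this row rebuilt from scratch.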
2.2. The "marking the quotient" routine. To take care of the quotient sets in Eq. (2) we
mark the quotient of each row in $C$ in the row itself. This ensures that rows below this row
don't use elements of the quotient sets.

Example. We use the same example as above. We start with $b_S^* = (1,1,0,0,0,0,0,0)$, which
means that

        ( 1 1 0 0 0 0 0 0 )
    C = ( 0 0 0 0 0 0 0 0 )
        ( 0 0 0 0 0 0 0 0 )

We mark the quotient set $Q(S)$ in line $b_S^*$ with a "q":

        ( 1 1 0 q 0 0 0 0 )
    C = ( 0 0 0 0 0 0 0 0 )
        ( 0 0 0 0 0 0 0 0 )

So the first possible $b_T^*$ line is

        ( 1 1 0 q 0 0 0 0 )
    C = ( 0 0 1 0 1 0 0 0 )
        ( 0 0 0 0 0 0 0 0 )

Note that $X \subseteq Q(X)$ for all $X \in \{S, T, U\}$. Thus, we only have to mark the elements in
$Q(X) \setminus X =: \tilde{Q}(X)$. Before we can move a 1 in a row $b_X^*$ we have to delete all marks $\tilde{Q}(X)$.

We have to deal with the case that we found a $b_T^*$ with the "moving 1" principle, but
$Q(S) \cap Q(T) \ne \{1\}$: In this situation we have to undo all steps in the process of "marking all
elements in $\tilde{Q}(T)$" and we have to find a new $b_T^*$ by moving a 1.

2.3. Efficient Storage of the Basic Binary Representation Matrix. If we use the matrix
$C$ to store all necessary information we have to store $3n$ elements and we need exactly 3 tests
to check if we can move a 1 to a position $p$: we have to check if $(b_S^*)_p = (b_T^*)_p = (b_U^*)_p = 0$.

We can omit the unnecessary space of $2n$ elements and the unnecessary 2 tests by projecting
$C$ to a vector $\mathrm{marked} \in \{-2, -1, 0, 1, 2, 3\}^n$:

$$C \mapsto 1 \cdot b_S^* + (-1) \cdot b_{\tilde{Q}(S)}^* + 2 \cdot b_T^* + (-2) \cdot b_{\tilde{Q}(T)}^* + 3 \cdot b_U^*$$

Example. Consider the basic binary representation matrix

        ( 1 1 0 q 0 q 0 0 0 0 0 0 )
    C = ( 0 0 1 0 1 0 q 0 q q 0 0 )
        ( 0 0 0 0 0 0 0 1 0 0 1 0 )

The corresponding marked vector is

marked = (1, 1, 2, -1, 2, -1, -2, 3, -2, -2, 3, 0).

The check $(b_S^*)_p = (b_T^*)_p = (b_U^*)_p = 0$ can now be done with marked[p] = 0.
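The projection onto the marked vector can be written down in a few lines. The sketch below assumes that supports and quotient marks are given as sets of 1-based positions; `build_marked` and `is_free` are illustrative names. The positions of the quotient marks are read off from the negative entries of the marked vector in the example.

```python
def build_marked(n, rows, quotient_marks):
    """Project the three rows and the marks of Q~(S), Q~(T) onto one vector of length n.

    rows           = (supp(b*_S), supp(b*_T), supp(b*_U))
    quotient_marks = (positions marked for Q~(S), positions marked for Q~(T))
    """
    marked = [0] * (n + 1)  # index 0 is unused so that positions stay 1-based
    for weight, positions in zip((1, 2, 3), rows):
        for p in positions:
            marked[p] = weight
    for weight, positions in zip((-1, -2), quotient_marks):
        for p in positions:
            marked[p] = weight
    return marked[1:]

def is_free(marked, p):
    """A 1 may be moved to position p only if no row and no quotient mark uses it."""
    return marked[p - 1] == 0

marked = build_marked(12, ({1, 2}, {3, 5}, {8, 11}), ({4, 6}, {7, 9, 10}))
print(marked)  # [1, 1, 2, -1, 2, -1, -2, 3, -2, -2, 3, 0]
```

A single comparison `marked[p] == 0` replaces the three row look-ups, and the quotient marks are covered by the same test.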

2.4. The Search Algorithm. The listing "SearchTPPTripleOfGivenType(G, m)" shows the
pseudo-code for the main function of the search algorithm. The interested reader can get a more
detailed version of this pseudo-code, all other pseudo-codes and an implementation in GAP
online [8] or via e-mail from the second author.

To test if a given candidate satisfies the TPP, we can use the test algorithms from Hedtke
and Murthy [9]. It would also be possible to use a specialized TPP test, because $Q(S)$ and $Q(T)$
are already known and they satisfy Eq. (2).
SearchTPPTripleOfGivenType(G, m)

for i = 1, ..., m - 1 do                     // start with S = {1_G, g_1, g_2, ..., g_{m-1}}
    marked[i] := 1;
repeat
    mark quotient set Q~(S) of row b*_S;
    if it is possible to generate a feasible row b*_T from scratch then
        repeat
            if it is possible to mark the quotient set Q~(T) of b*_T without a conflict with Q(S) then
                if it is possible to generate a feasible row b*_U from scratch then
                    repeat
                        if (S, T, U) is a TPP triple then        // use a test from [9]
                            return (S, T, U);
                    until it is not possible to use the "moving 1" principle for b*_U anymore;
                unmark the quotient set Q~(T) of b*_T;
        until it is not possible to use the "moving 1" principle for b*_T anymore;
    unmark the quotient set Q~(S) of b*_S;
until it is not possible to use the "moving 1" principle for b*_S anymore;
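The control flow of the listing can be imitated with a much simpler brute-force search. The sketch below is not the authors' implementation: it assumes the group is given as a Cayley table of indices with the identity at index 0, enumerates rows with `itertools.combinations` instead of the incremental "moving 1" and marked-vector machinery, and requires $m \ge 2$. It applies the ordering of Observation (2) and the conditions of Eq. (2) as filters before running a full TPP test.

```python
from itertools import combinations, product

def cayley_table(elements, mul):
    """table[i][j] = index of elements[i] * elements[j]; elements[0] must be the identity."""
    index = {g: i for i, g in enumerate(elements)}
    return [[index[mul(g, h)] for h in elements] for g in elements]

def search_tpp_triple(table, m):
    """Return a basic (m, m, m) TPP triple as three index tuples, or None (m >= 2)."""
    n = len(table)
    inv = [row.index(0) for row in table]  # inv[i] = index of the inverse of g_i

    def quotient(X):
        return {table[x][inv[y]] for x in X for y in X}

    def is_tpp(QS, QT, QU):
        return all((s, t, u) == (0, 0, 0)
                   for s, t, u in product(QS, QT, QU) if table[table[s][t]][u] == 0)

    nonidentity = range(1, n)
    for S_rest in combinations(nonidentity, m - 1):
        S = (0,) + S_rest
        QS = quotient(S)
        # T must avoid Q(S) and start to the right of min supp(b*_S).
        free_T = [g for g in nonidentity if g not in QS and g > S_rest[0]]
        for T_rest in combinations(free_T, m - 1):
            T = (0,) + T_rest
            QT = quotient(T)
            if QS & QT != {0}:
                continue  # conflict between Q(S) and Q(T)
            free_U = [g for g in nonidentity
                      if g not in QS and g not in QT and g > T_rest[0]]
            for U_rest in combinations(free_U, m - 1):
                U = (0,) + U_rest
                if is_tpp(QS, QT, quotient(U)):
                    return S, T, U
    return None

def compose(p, q):
    return tuple(p[i] for i in q)

# Sym(3): identity, three transpositions, two 3-cycles.
e, a, b, c = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)
r = compose(a, b)
s3_table = cayley_table([e, a, b, c, r, compose(r, r)], compose)
print(search_tpp_triple(s3_table, 2))  # ((0, 1), (0, 2), (0, 3))

z7_table = cayley_table(list(range(7)), lambda x, y: (x + y) % 7)
print(search_tpp_triple(z7_table, 2))  # None
```

The cyclic group of order 7 yields no (2, 2, 2) triple, as expected: in an abelian group the size of a TPP triple cannot exceed the group order.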



3. An Application for 5×5 Matrix Multiplication

In this section, we describe an application of the new algorithm. We will show that if a finite
group $G$ admits a (5, 5, 5) triple, then $R(G) \ge 100$. That is, we cannot improve the current
best bound for $R(5)$ using this particular TPP approach; of course there may be other
group-theoretic methods that do yield better bounds. Even with the new algorithm, it is not feasible
simply to test all groups of order less than 100 for (5, 5, 5) triples. Therefore we must use
theoretical methods to reduce the list of candidates that must be checked. We will also produce
a list of groups that could in theory contain a (5, 5, 5) triple for which $\underline{R}(G) < 125$ (as defined
below).

For a finite group $G$, let $T(G)$ be the number of irreducible complex characters of $G$ and $b(G)$
the largest degree of an irreducible character of $G$.
We start with two known results.

Theorem 3.1. [13, Theorem 6 and Remark 2] Let $G$ be a group.

(1) If $b(G) = 1$, then $R(G) = |G|$.

(2) If $b(G) = 2$, then $R(G) = 2|G| - T(G)$.

(3) If $b(G) \ge 3$, then $R(G) \ge 2|G| + b(G) - T(G) - 1$.

We write $\overline{R}(G) := \sum_i R(d_i)$ for the best known upper bound (follows from Wedderburn's
structure theorem) and $\underline{R}(G)$ for the best known lower bound (the theorem above) for $R(G)$.
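Both bounds depend only on the character degree pattern of $G$. The following sketch assumes that the upper bound is evaluated with the best known values for $R(d)$ from Table 1 (plus the trivial $R(1) = 1$) and that the lower bound is the one from Theorem 3.1; the function names are illustrative.

```python
# Best known upper bounds for R(d), d = 1..5 (Table 1, plus the trivial R(1) = 1).
BEST_KNOWN_RANK = {1: 1, 2: 7, 3: 23, 4: 49, 5: 100}

def upper_bound(degrees):
    """Upper bound for R(G): sum of the best known bounds for R(d_i)."""
    return sum(BEST_KNOWN_RANK[d] for d in degrees)

def lower_bound(degrees):
    """Lower bound for R(G) from Theorem 3.1 (exact value if b(G) <= 2)."""
    order = sum(d * d for d in degrees)  # |G| is the sum of the squared degrees
    T, b = len(degrees), max(degrees)    # number of characters, largest degree
    if b == 1:
        return order
    if b == 2:
        return 2 * order - T
    return 2 * order + b - T - 1

pattern = [1] * 3 + [3] * 5  # degree pattern (1^3, 3^5) of a group of order 48
print(lower_bound(pattern), upper_bound(pattern))  # 90 118
```

For the pattern $(1^3, 3^5)$ this gives the pair 90 and 118, and for $(1^{40}, 2^{10})$ both bounds equal 110.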

Definition 3.2 (C1 and C2 candidates). A group $G$ that realizes (5, 5, 5) and satisfies
$\underline{R}(G) < 100$ will be called a C1 candidate. A group $G$ that realizes (5, 5, 5) and satisfies
$\underline{R}(G) < 125$ will be called a C2 candidate.

The following is well known, but we include a short proof for ease of reference. 

Lemma 3.3. If $G$ is non-abelian, then $T(G) \le \frac{5}{8}|G|$. Equality implies that $|G : Z(G)| = 4$.

Proof. If the quotient $G/Z(G)$ is cyclic, then $G$ is abelian. Therefore if $G$ is non-abelian, then
$|G : Z(G)| \ge 4$. Hence $|Z(G)| \le \frac{1}{4}|G|$. Now $T(G)$ is known to equal the number of conjugacy
classes of $G$. For any $x \in G$, either $x$ is central or $|x^G| \ge 2$. The number of conjugacy classes
of length at least 2 is $T(G) - |Z(G)|$. Therefore $|G| \ge |Z(G)| + 2(T(G) - |Z(G)|)$. This implies
$T(G) \le \frac{1}{2}(|G| + |Z(G)|) \le \frac{5}{8}|G|$. Equality is only possible when $|Z(G)| = \frac{1}{4}|G|$. □
Obviously, it is necessary to keep the list of all C1 and C2 candidates as short as possible.
To achieve this goal we will develop some common properties of C1 and C2 candidates in this
section. We will use them to eliminate as many candidates as possible from the list.

It will be helpful to establish some notation in the particular case where a group has a TPP
triple and a subgroup of index 2.

Definition 3.4. Let $G$ be a group with a TPP triple $(S, T, U)$, and suppose $H$ is a subgroup
of index 2 in $G$. We define $S_0 = S \cap H$, $T_0 = T \cap H$, $U_0 = U \cap H$, $S_1 = S \setminus H$, $T_1 = T \setminus H$ and
$U_1 = U \setminus H$.

Lemma 3.5. Suppose $G$ realizes (5, 5, 5). If $G$ has a subgroup $H$ of index 2, then $H$ realizes
(3, 3, 3).

Proof. Suppose $G$ realizes (5, 5, 5) via the TPP triple $(S, T, U)$. If $|S_0| < |S_1|$, then for any
$a \in S_1$, replace $S$ by $Sa^{-1}$ (which is allowed by Lemma 1.9). This will have the effect of
interchanging the sizes of $S_0$ and $S_1$. Hence we may assume that $|S_0| > |S_1|$, $|T_0| > |T_1|$ and
$|U_0| > |U_1|$. Now $(S_0, T_0, U_0)$ is a TPP triple of $H$, because subsets of a TPP triple again satisfy
the TPP, and since each of $S_0$, $T_0$ and $U_0$ has at least 3 elements, clearly $H$ realizes (3, 3, 3). □

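Lemma 3.6. Suppose $G$ has a TPP triple $(S, T, U)$. Let $H$ be an abelian subgroup of index 2
in $G$. Then the following hold.

a) $|S_0^{-1} T_0 U_0| = |S_0||T_0||U_0|$;

b) $|S_1^{-1} T_1 U_0| \ge |S_1||T_1|$;

c) $|S_1^{-1} U_1| = |S_1||U_1|$;

d) $S_0^{-1} T_0 U_0 \cap S_1^{-1} T_1 U_0 = \emptyset$;

e) $S_0^{-1} T_0 U_0 \cap S_1^{-1} U_1 T_0 = \emptyset$;

f) $S_1^{-1} T_1 U_0 \cap S_1^{-1} U_1 T_0 = \emptyset$.

Proof. The proof relies almost entirely on the definition of a TPP triple $(S, T, U)$: if $s \in Q(S)$,
$t \in Q(T)$ and $u \in Q(U)$ with $stu = 1$, then $s = t = u = 1$.

a) The map $(s, t, u) \mapsto s^{-1}tu$ from $S_0 \times T_0 \times U_0$ to $S_0^{-1} T_0 U_0$ is clearly surjective. It is also
injective: suppose $s^{-1}tu = \hat{s}^{-1}\hat{t}\hat{u}$ for some $s, \hat{s} \in S_0$, $t, \hat{t} \in T_0$ and $u, \hat{u} \in U_0$. Then,
remembering that $H$ is abelian, we may rearrange to get $(\hat{s}s^{-1})(t\hat{t}^{-1})(u\hat{u}^{-1}) = 1$, forcing
(by definition of TPP triple) $s = \hat{s}$, $t = \hat{t}$, $u = \hat{u}$. Therefore the map is bijective and
$|S_0^{-1} T_0 U_0| = |S_0||T_0||U_0|$.

b) The map $(s_1, t_1) \mapsto s_1^{-1} t_1 1$ from $S_1 \times T_1$ to $S_1^{-1} T_1 U_0$ is injective as $s_1^{-1} t_1 1 = \hat{s}_1^{-1} \hat{t}_1 1$, for
some $s_1, \hat{s}_1 \in S_1$ and $t_1, \hat{t}_1 \in T_1$, implies $(\hat{s}_1 s_1^{-1})(t_1 \hat{t}_1^{-1})(1 \cdot 1^{-1}) = 1$, which implies $s_1 = \hat{s}_1$
and $t_1 = \hat{t}_1$. Thus $|S_1^{-1} T_1 U_0| \ge |S_1||T_1|$.

c) The map $(s_1, u_1) \mapsto s_1^{-1} u_1$ from $S_1 \times U_1$ to $S_1^{-1} U_1$ is clearly surjective; it is injective
as $s_1^{-1} u_1 = \hat{s}_1^{-1} \hat{u}_1$ implies $(\hat{s}_1 s_1^{-1})(1 \cdot 1^{-1})(u_1 \hat{u}_1^{-1}) = 1$ and hence $s_1 = \hat{s}_1$ and $u_1 = \hat{u}_1$.
Therefore $|S_1^{-1} U_1| = |S_1||U_1|$.

d) A nonempty intersection $S_0^{-1} T_0 U_0 \cap S_1^{-1} T_1 U_0 \ne \emptyset$ implies there exist $s_0 \in S_0$, $t_0 \in T_0$,
$u_0, \hat{u}_0 \in U_0$, $s_1 \in S_1$ and $t_1 \in T_1$ such that $s_0^{-1} t_0 u_0 = s_1^{-1} t_1 \hat{u}_0$. But then
$t_1^{-1} s_1 s_0^{-1} t_0 u_0 \hat{u}_0^{-1} = 1$. Now $t_1^{-1} s_1$, $s_0$ and $t_0$ are all elements of the abelian group $H$.
Therefore we can rearrange to get $(t_0 t_1^{-1})(s_1 s_0^{-1})(u_0 \hat{u}_0^{-1}) = 1$. Since $(T, S, U)$ is a TPP triple,
this implies $s_0 = s_1$, contradicting the fact that $s_0$ and $s_1$ lie in different $H$-cosets. Therefore
$S_0^{-1} T_0 U_0 \cap S_1^{-1} T_1 U_0 = \emptyset$.

e) Suppose for some $s_0 \in S_0$, $t_0, \hat{t}_0 \in T_0$, $u_0 \in U_0$, $s_1 \in S_1$ and $u_1 \in U_1$ we have
$s_0^{-1} t_0 u_0 = s_1^{-1} u_1 \hat{t}_0$. Then $(s_0 s_1^{-1})(u_1 u_0^{-1})(\hat{t}_0 t_0^{-1}) = 1$, which implies (by the TPP for
$(S, U, T)$) that $s_0 = s_1$, a contradiction. Therefore $S_0^{-1} T_0 U_0 \cap S_1^{-1} U_1 T_0 = \emptyset$.

f) Suppose for some $s_1, \hat{s}_1 \in S_1$, $t_0 \in T_0$, $t_1 \in T_1$, $u_0 \in U_0$ and $u_1 \in U_1$, we have
$s_1^{-1} t_1 u_0 = \hat{s}_1^{-1} u_1 t_0$. Then $(\hat{s}_1 s_1^{-1})(t_1 t_0^{-1})(u_0 u_1^{-1}) = 1$, which implies $u_0 = u_1$, a
contradiction. Therefore $S_1^{-1} T_1 U_0 \cap S_1^{-1} U_1 T_0 = \emptyset$. □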
Theorem 3.7. If $G$ realizes (5, 5, 5) and $|G| \le 72$, then $G$ has no abelian subgroups of index 2.

Proof. Suppose $G$ has an abelian subgroup $H$ of index 2 and realizes (5, 5, 5) via the TPP triple
$(S, T, U)$. Define $S_0$, $T_0$, $U_0$, $S_1$, $T_1$ and $U_1$ as in Definition 3.4. Then, as in the proof of
Lemma 3.5, we may assume $|S_0| \ge 3$, $|T_0| \ge 3$ and $|U_0| \ge 3$. Without loss of generality we
may assume that $|S_0| \ge |T_0|$ and $|S_0| \ge |U_0|$. Now since $|G| \le 72$, we have $|H| \le 36$. So, from
Lemma 3.6 we have

$$36 \ge |H| \ge |S_0^{-1} T_0 U_0 \cup S_1^{-1} U_1 T_0 \cup S_1^{-1} T_1 U_0|$$
$$= |S_0||T_0||U_0| + |S_1^{-1} U_1 T_0| + |S_1^{-1} T_1 U_0| \qquad (3)$$
$$\ge |S_0||T_0||U_0| + |S_1||U_1| + |S_1||T_1|. \qquad (4)$$

Using Equation (4), if either $|T_0| \ge 4$ or $|U_0| \ge 4$, we have $|S_0| \ge 4$, which forces $|H| \ge 48$, a
contradiction. Thus $|T_0| = |U_0| = 3$. If $|S_0| \ge 4$ then we get $|H| \ge 40$, another contradiction.
Therefore $|S_0| = |T_0| = |U_0| = 3$, which gives that $|H| \ge 27 + 4 + 4 = 35$, and so $|H| \in \{35, 36\}$.
If two of $Q(S_0)$, $Q(T_0)$ and $Q(U_0)$ were groups of order 4, then they would generate a subgroup
of order 16 in $H$, which is impossible. Therefore, permuting $S$, $T$ and $U$ if necessary, we may
assume that $Q(T_0)$ and $Q(U_0)$ are not subgroups of order 4.

Now consider $S_1^{-1} U_1 T_0$. Write $X = S_1^{-1} U_1$. Then $|X| = 4$. If $|X T_0| = 4$, then $X T_0 = X$,
and thus $X \langle T_0 \rangle = X$, which implies that $X$ is a union of $\langle T_0 \rangle$-cosets. In particular, the order
of $\langle T_0 \rangle$ divides $|X| = 4$. But $T_0$ alone contains 3 elements. Hence $\langle T_0 \rangle$ has order 4. A quick
check shows that $Q(T_0) = \langle T_0 \rangle$, contradicting the fact that $Q(T_0)$ is not a subgroup of order 4.
We have therefore shown that $|S_1^{-1} U_1 T_0| > 4$. A similar argument with $S_1^{-1} T_1 U_0$ and $Q(U_0)$
shows that $|S_1^{-1} T_1 U_0| > 4$. Substituting back into Equation (3) gives $|H| \ge 27 + 5 + 5 = 37$,
a contradiction. Therefore no group of order at most 72 can have both a (5, 5, 5) triple and an
abelian subgroup of index 2. □
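The case distinction on $(|S_0|, |T_0|, |U_0|)$ in the proof can be double-checked by enumerating all admissible sizes. This is only a sanity check of the arithmetic in Equation (4), assuming $|S| = |T| = |U| = 5$ and $|S_0| \ge |T_0|, |U_0| \ge 3$ as in the proof; the names are illustrative.

```python
def lower_bound_on_H(s0, t0, u0, size=5):
    """Right-hand side of (4) for |S| = |T| = |U| = size."""
    s1, t1, u1 = size - s0, size - t0, size - u0
    return s0 * t0 * u0 + s1 * u1 + s1 * t1

# All size patterns with 3 <= |T_0|, |U_0| <= |S_0| <= 5 that are compatible with |H| <= 36.
feasible = [(s0, t0, u0)
            for s0 in range(3, 6)
            for t0 in range(3, s0 + 1)
            for u0 in range(3, s0 + 1)
            if lower_bound_on_H(s0, t0, u0) <= 36]
print(feasible)                   # [(3, 3, 3)]
print(lower_bound_on_H(3, 3, 3))  # 35
print(lower_bound_on_H(4, 3, 3))  # 40
```

Only the pattern (3, 3, 3) survives, with the bound $27 + 4 + 4 = 35$, which is the case handled in the second half of the proof.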

We are grateful to Peter M. Neumann for pointing out an argument which considerably 
shortened our proof for the case \H\ = 36 in the above result. 

3.1. C1 Candidates.

Proposition 3.8. If $G$ is a C1 candidate, then $G$ is non-abelian and $45 \le |G| \le 72$.

Proof. If $G$ is abelian then $R(G) = |G|$. The maximal size of a TPP triple that $G$ can realize
is $|G|$, so realizing (5, 5, 5) would require $|G| \ge 125$. Therefore $G$ cannot be a C1 candidate.
Assume then that $G$ is non-abelian. The fact that $|G| \ge 45$ follows immediately from Lemma 1.11.
For the upper bound, the fact that $T(G) \le \frac{5}{8}|G|$ implies $2|G| - T(G) \ge \frac{11}{8}|G|$ and hence, by
Theorem 3.1, $R(G) \ge \frac{11}{8}|G|$. So if $|G| > 72$, then $R(G) > \frac{11}{8} \cdot 72 = 99$. Hence $G$ is not a C1
candidate. Therefore, if $G$ is a C1 candidate, then $45 \le |G| \le 72$. □

Theorem 3.9. No group of order 64 is a C1 candidate.

Proof. A GAP calculation of Pospelov's lower bound on $R(G)$, followed by elimination of any
group with an abelian subgroup of index 2, leaves a possible list of seven groups of order 64
that could be C1 candidates. If any of these groups $G$ were to realize a (5, 5, 5) triple, then any
subgroup of order 32 in $G$ would realize a (3, 3, 3) triple. But a brute-force computer search,
similar to that performed by two of the current authors in [9], shows that each of these groups
of order 64 has at least one subgroup of order 32 which does not realize (3, 3, 3). Therefore, no
group of order 64 is a C1 candidate. □

Theorem 3.10. Table 2 contains all possible C1 candidates.

Proof. By Proposition 3.8 we need only look at groups of order between 45 and 72. A simple
GAP program can calculate Pospelov's lower bound on $R(G)$. Any group for which this bound is
greater than 99 can be eliminated. Next, we can eliminate any group with an abelian subgroup
GAP ID     structure                character degree pattern    lower bound for R(G)    upper bound for R(G)
[48,?]     C_4^2 ⋊ C_3              (1^3, 3^5)                  90                      118
[48,28]    ?                        (1^2, 2^3, 3^2, 4^1)        91                      118
[48,29]    ?                        (1^2, 2^3, 3^2, 4^1)        91                      118
[48,30]    ?                        (1^4, 2^2, 3^4)             88                      110
[48,31]    C_4 × A_4                (1^12, 3^4)                 82                      104
[48,32]    C_2 × SL(2,3)            (1^6, 2^6, 3^2)             84                      94
[48,33]    SL(2,3) ⋊ C_2            (1^6, 2^6, 3^2)             84                      94
[48,48]    C_2 × S_4                (1^4, 2^2, 3^4)             88                      110
[48,49]    C_2^2 × A_4              (1^12, 3^4)                 82                      104
[48,50]    C_2^4 ⋊ C_3              (1^3, 3^5)                  90                      118
[54,10]    C_2 × (C_3^2 ⋊ C_3)      (1^18, 3^4)                 88                      110
[54,11]    C_2 × (C_9 ⋊ C_3)        (1^18, 3^4)                 88                      110

Table 2. All possible C1 candidates. (Entries shown as "?" are not legible in this copy.)



of index 2 by Theorem 3.7 and any group of order 64 by Theorem 3.9. This reduces the list
to 20 groups. Finally, we observe that if any group of order 48 is a candidate, then any of its
subgroups of order 24 must realize a (3, 3, 3) triple. Another brute-force search on groups of
order 24 eliminates ten groups of order 48 from the list. The final list contains ten groups of
order 48 and two of order 54. □

3.2. C2 Candidates.

Proposition 3.11. If $G$ is a C2 candidate, then $G$ is non-abelian and $45 \le |G| \le 90$.

Proof. We use the same arguments as in the proof of Proposition 3.8. If $|G| \ge 91$, then
$R(G) \ge \frac{11}{8} \cdot 91 > 125$. Hence $G$ is not a C2 candidate. Therefore if $G$ is a C2 candidate, then
$45 \le |G| \le 90$. □
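The two order bounds in Propositions 3.8 and 3.11 come from the same inequality $R(G) \ge \frac{11}{8}|G|$ for non-abelian groups. The following sketch recomputes them with exact rational arithmetic; the function name is an illustrative assumption.

```python
from fractions import Fraction

def max_candidate_order(threshold):
    """Largest group order n with (11/8) * n < threshold."""
    order = 1
    while Fraction(11, 8) * (order + 1) < threshold:
        order += 1
    return order

print(max_candidate_order(100))  # 72 (C1 candidates)
print(max_candidate_order(125))  # 90 (C2 candidates)
```

Indeed $\frac{11}{8} \cdot 73 = 100.375$ and $\frac{11}{8} \cdot 91 = 125.125$ already exceed the two thresholds.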



Theorem 3.12. Table 3 contains all possible C2 candidates that are not C1 candidates.

Proof. By Proposition 3.11 we can restrict our attention to groups of order between 45 and 90.
We can use Pospelov's bound for $R(G)$ and (for groups of order at most 72) the existence of
abelian subgroups of index 2 to eliminate many candidates. After these observations, we look
to see if any of the remaining candidates have subgroups of index 2 that do not realize (3, 3, 3).
If so, then by Lemma 3.5 the group cannot be a C2 candidate. After this process, 37 groups
remain as candidates. Twelve are the existing C1 candidates we already know about. So there
are 25 'new' groups here. □

We note that one of the C2 candidates, $A_5$, is already known ([12, Section 3]) to have a
(5, 5, 5) triple so we would not need to check it again computationally.

4. Computations, Tests and Results 

4.1. Runtime. We tested our new search algorithm against a specialized version (that only
looks for (m, m, m) triples) of the currently best known search algorithm with the test routine
TPPTestMurthy (see [9]). Note that we only consider groups that do not realize (3, 3, 3) to
show the worst-case runtimes of the searches. Table 4 lists the runtime¹ of the search for
(3, 3, 3) TPP triples in non-abelian groups of order up to 26 that satisfy Neumann's inequality
$3(3 + 3 - 1) \le |G|$. Our algorithm achieves a speed-up of 40 in the worst-case and 194 in the

¹The tests were made with GAP 4.6.3 64-bit (compiled with GCC 4.2.1 on OS X 10.8.3 using the included
Makefile) on an Intel® Core™ i7-2820QM CPU @ 2.30GHz machine with 8 GB DDR3 RAM @ 1333MHz.
GAP ID     structure                      character degree pattern    lower bound for R(G)    upper bound for R(G)
[The first 18 rows of this table are not legible in this copy.]
[80,22]    C_5 × (C_4 ⋊ C_4)              (1^40, 2^10)                110                     110
[80,24]    C_5 × (C_8 ⋊ C_2)              (1^40, 2^10)                110                     110
[80,46]    C_10 × D_8                     (1^40, 2^10)                110                     110
[80,47]    C_10 × Q_8                     (1^40, 2^10)                110                     110
[80,48]    C_5 × ((C_4 × C_2) ⋊ C_2)      (1^40, 2^10)                110                     110
[88,9]     C_11 × D_8                     (1^44, 2^11)                121                     121
[88,10]    C_11 × Q_8                     (1^44, 2^11)                121                     121

Table 3. All possible C2 candidates that are not C1 candidates.



best-case in comparison to the specialized version of Hedtke and Murthy [9j. We are able to 
shrink the number of candidates that we have to test for the TPP by a factor of 14 in the 
worst-case and 59 in the best-case. We remark that there are cases where the old algorithm 
tests 450,450 candidates and the new algorithm requires no TPP tests at all. 

We only did tests in the (3, 3, 3) case, because the old algorithm is too slow to do a comparison
like Table 4 for the (4, 4, 4) case (or higher). The search need only be run in groups that satisfy
Neumann's inequality: a group $G$ can only realize $(m, m, m)$ if it satisfies $m(2m - 1) \le |G|$.

We remark that the speed-up decreases as the group becomes larger. However, this is
not of particular concern in the context of our problem: the old search algorithm works on $S$, $T$
and $U$ and the new algorithm works on $Q(S)$, $Q(T)$ and $U$. So in the best-case the old algorithm
uses $|S| + |T| + |U| = 3m$ elements and the new algorithm uses $|Q(S)| + |Q(T)| + |U| \le m^2 + m^2 + m$
elements to filter TPP triple candidates. The speed-up will be problematically small when
$m^2 \ll |G|$, but one will only look for groups that are near Neumann's lower bound to get a good
matrix multiplication algorithm.
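
To make these counts concrete, the following minimal Python sketch evaluates Neumann's lower bound on the group order and the two element counts for small m. The function names are hypothetical, and the inequality is read here as m(2m - 1) <= |G|, which is an assumption about the relation sign.

```python
def neumann_min_order(m):
    """Smallest group order allowed by Neumann's inequality, read as m(2m - 1) <= |G|."""
    return m * (2 * m - 1)


def filter_element_counts(m):
    """Return (|S| + |T| + |U|, upper bound on |Q(S)| + |Q(T)| + |U|)."""
    old_count = 3 * m
    new_upper_bound = m * m + m * m + m
    return old_count, new_upper_bound


for m in (3, 4, 5):
    print(m, neumann_min_order(m), filter_element_counts(m))
```

For m = 5 the lower bound on the group order is 45, so the candidate groups of order 48 to 88 considered for the (5, 5, 5) search all lie within a factor of two of it.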

| GAP ID | structure | average* runtime in ms, new algo. | average* runtime in ms, old algo. | speed-up | number of TPP tests, new algo. | number of TPP tests, old algo. | search space reduction factor** |
|---|---|---|---|---|---|---|---|
| [16,3] | $(C_4 \times C_2) \rtimes C_2$ | 192 | 20,133 | 104 | 11,595 | 450,450 | 38 |
| [16,4] | $C_4 \rtimes C_4$ | 140 | 19,481 | 139 | 0 | 450,450 | $\infty$ |
| [16,6] | $C_8 \rtimes C_2$ | 112 | 19,031 | 169 | 0 | 450,450 | $\infty$ |
| [16,7] | $D_{16}$ | 241 | 20,416 | 84 | 14,336 | 450,450 | 31 |
| [16,8] | $QD_{16}$ | 162 | 20,060 | 123 | 9,005 | 450,450 | 50 |
| [16,9] | $Q_{16}$ | 99 | 19,250 | 194 | 0 | 450,450 | $\infty$ |
| [16,11] | $C_2 \times D_8$ | 311 | 20,079 | 64 | 19,314 | 450,450 | 23 |
| [16,12] | $C_2 \times Q_8$ | 135 | 18,667 | 138 | 7,628 | 450,450 | 59 |
| [16,13] | $(C_4 \times C_2) \rtimes C_2$ | 201 | 19,538 | 97 | 12,107 | 450,450 | 37 |
| [18,1] | $D_{18}$ | 658 | 51,899 | 78 | 39,499 | 1,113,840 | 28 |
| [18,3] | $C_3 \times S_3$ | 341 | 50,360 | 147 | 20,134 | 1,113,840 | 55 |
| [18,4] | $C_3^2 \rtimes C_2$ | 646 | 51,131 | 79 | 39,999 | 1,113,840 | 27 |
| [20,1] | $C_5 \rtimes C_4$ | 1,028 | 119,588 | 116 | 54,233 | 2,441,880 | 45 |
| [20,3] | $C_5 \rtimes C_4$ | 1,388 | 121,702 | 87 | 73,971 | 2,441,880 | 33 |
| [20,4] | $D_{20}$ | 2,033 | 118,599 | 58 | 114,979 | 2,441,880 | 21 |
| [22,1] | $D_{22}$ | 4,539 | 241,524 | 53 | 240,950 | 4,883,760 | 19 |
| [24,1] | $C_3 \rtimes C_8$ | 5,610 | 501,854 | 89 | 292,340 | 9,085,230 | 31 |
| [24,4] | $C_3 \rtimes Q_8$ | [illegible] | [illegible] | 82 | [illegible] | 9,085,230 | 29 |
| [24,5] | $C_4 \times S_3$ | 7,912 | 483,640 | 61 | 419,556 | 9,085,230 | 21 |
| [24,6] | $D_{24}$ | 10,711 | 479,688 | 44 | 568,672 | 9,085,230 | 15 |
| [24,7] | $C_2 \times (C_3 \rtimes C_4)$ | 6,623 | 486,323 | 73 | 339,829 | 9,085,230 | 26 |
| [24,8] | $(C_6 \times C_2) \rtimes C_2$ | 8,804 | 479,182 | 54 | 463,453 | 9,085,230 | 19 |
| [24,10] | $C_3 \times D_8$ | 6,540 | 481,217 | 73 | 359,830 | 9,085,230 | 25 |
| [24,11] | $C_3 \times Q_8$ | 5,250 | 490,716 | 93 | 284,001 | 9,085,230 | 31 |
| [24,14] | $C_2 \times C_2 \times S_3$ | 11,555 | 475,916 | 41 | 622,455 | 9,085,230 | 14 |
| [26,1] | $D_{26}$ | 20,658 | 832,722 | 40 | 1,024,317 | 15,939,000 | 15 |

* The average is taken over 10 runs in which the highest and lowest runtimes are omitted.
** A factor $x$ means that (# TPP tests of the new algo.) $\le \frac{1}{x}$ (# TPP tests of the old algo.).

Table 4. Comparison of the average runtime and number of TPP tests in the
search of (3, 3, 3) TPP triples for the old and the new search algorithm.

It is not easy to get results about the asymptotic runtime, because that highly depends on
the structure of the groups. But as a worst-case result we get

$$O\Biggl(\underbrace{\frac{|G|!}{m!^3\,(|G| - 3m)!}}_{\text{bound for the number of triples that satisfy Eq. (?)}} \cdot \underbrace{m \log m}_{\text{worst-case runtime for a TPP test with Thm. 1.12}}\Biggr)$$

as a bound for the runtime of our new algorithm. This is exactly the same bound as for the
algorithm in [9]. But as the results in Table 4 show, the real runtime of our new algorithm highly
depends on $m$ and the structure of the group, whereas the real runtime of the old algorithm
seems to depend only on $m$ and the size of the group.
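
The combinatorial factor in the worst-case bound can be evaluated directly. The sketch below is only an illustration with a hypothetical function name; it reads the factor as |G|! / (m!^3 (|G| - 3m)!), which is an assumption about the garbled display, and it does not model the per-test cost.

```python
from math import factorial


def candidate_bound(group_order, m):
    """Evaluate |G|! / (m!^3 * (|G| - 3m)!), read here as the worst-case number of candidates."""
    if group_order < 3 * m:
        raise ValueError("need |G| >= 3m")
    denominator = factorial(m) ** 3 * factorial(group_order - 3 * m)
    return factorial(group_order) // denominator


for order in (16, 18, 20, 24):
    print(order, candidate_bound(order, 3))
```

The quantity grows quickly with the group order, which is consistent with the remark that the measured runtimes depend strongly on how many of these candidates the filter can discard before a TPP test is needed.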

4.2. Managing the (Parallel) Computation on a (Super-) Computer. To compute the
results (next section) for the search of (5, 5, 5) TPP triples we used a supercomputer (a cluster
with Sun Grid Engine) at the Martin-Luther-University Halle-Wittenberg. The computations
(and their management) took several months. The number of $b_S$'s can be computed with

$$\#\text{ of } b_S\text{'s} = \bigl|\{(x_1, x_2, x_3, x_4) \in \mathbb{N}^4 : 1 \le x_1 < x_2 < x_3 < x_4 \le |G|\}\bigr| = \sum_{x_1=1}^{|G|-3}\ \sum_{x_2=x_1+1}^{|G|-2}\ \sum_{x_3=x_2+1}^{|G|-1}\ \sum_{x_4=x_3+1}^{|G|} 1 = \frac{1}{24}\bigl(|G|^4 - 6|G|^3 + 11|G|^2 - 6|G|\bigr).$$
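
As a sanity check, the closed form can be compared with a brute-force enumeration of the increasing 4-tuples. This is a minimal sketch with hypothetical function names, not the authors' code.

```python
from itertools import combinations
from math import comb


def closed_form(n):
    """(n^4 - 6n^3 + 11n^2 - 6n) / 24, the closed form quoted in the text."""
    return (n**4 - 6 * n**3 + 11 * n**2 - 6 * n) // 24


def brute_force(n):
    """Count tuples 1 <= x1 < x2 < x3 < x4 <= n by explicit enumeration."""
    return sum(1 for _ in combinations(range(1, n + 1), 4))


for n in (4, 5, 10, 16):
    assert closed_form(n) == brute_force(n) == comb(n, 4)

print(closed_form(48))  # closed form evaluated at |G| = 48
print(closed_form(47))  # closed form evaluated at |G| - 1 = 47
```

One observation, offered with caution because the source does not spell it out: the entries printed in Table 5 agree with this closed form evaluated at |G| - 1 (for instance 178,365 for |G| = 48), which would fit a convention in which one index is reserved, for example for the identity element.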

The number of $b_S$'s for all groups in the Tables 2 and 3 can be found in Table 5.

| $\lvert G\rvert$ | 48 | 52 | 54 | 55 | 56 | 57 |
|---|---|---|---|---|---|---|
| # of $b_S$'s | 178,365 | 249,900 | 292,825 | 316,251 | 341,055 | 367,290 |

| $\lvert G\rvert$ | 60 | 63 | 72 | 78 | 80 | 88 |
|---|---|---|---|---|---|---|
| # of $b_S$'s | 455,126 | 557,845 | 971,635 | 1,353,275 | 1,502,501 | 2,225,895 |

Table 5. Number of $b_S$'s for all groups in the Tables 2 and 3.

We implemented the search algorithm with the optional arguments startrow and numberOfRowOneTests
to realize a rudimentary parallelization: With an easy script we construct the set of all possible
$b_S$'s and divide it into subsets of size 1,000 resp. 10,000. Now we start (# of $b_S$'s)/1,000 resp.
(# of $b_S$'s)/10,000 independent jobs on a cluster, each with a different startrow, and each job has to check
numberOfRowOneTests = 1,000 resp. numberOfRowOneTests = 10,000 of the $b_S$'s. It is clear
that even with an optimized search algorithm this is an immense amount of work. For this
reason we resorted to tricks like going from the matrix representation C to
the vector representation marked to get a sufficient speed-up to solve the (5, 5, 5) problem.
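
The job splitting described above can be sketched as follows. This is a hypothetical helper, not the authors' script; it assumes that rows are numbered from 1 and that the last job simply receives the remainder.

```python
def job_ranges(total_rows, chunk_size):
    """Return (startrow, numberOfRowOneTests) pairs that cover rows 1..total_rows."""
    jobs = []
    startrow = 1
    while startrow <= total_rows:
        count = min(chunk_size, total_rows - startrow + 1)
        jobs.append((startrow, count))
        startrow += count
    return jobs


# Example: the 178,365 rows listed in Table 5 for |G| = 48, in chunks of 1,000.
jobs = job_ranges(178365, 1000)
print(len(jobs), jobs[0], jobs[-1])
```

Each pair could then be passed to one independent cluster job, so that no two jobs inspect the same row and every row is inspected exactly once.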

4.3. Results. Our search for (5, 5, 5) TPP triples in all groups of the C1 list showed that no
group can realize 5x5 matrix multiplication with fewer than 100 scalar multiplications within the
group-theoretic framework by Cohn and Umans. This continues the results [9, Theorem 7.3] of
two of the authors, who showed the same statement for 3x3 and 4x4 matrix multiplication.

5. How to Construct a Matrix Multiplication Algorithm from a TPP Triple?

As the results show, we were not able to find a group $G$ that realizes (5, 5, 5) with $R(G) < 100$.
But the groups in the C2 list could realize (5, 5, 5) with fewer than 125 scalar multiplications,
because $R(G) \le 124$. This section shows a strategy to search for a nontrivial 5x5 matrix
multiplication algorithm in the C2 list.

Consider the case that we found a (5, 5, 5) TPP triple $(S, T, U)$ in a group $G$ of the C2
list. We only know that $R(G) < 125$, so we don't know if this leads to a nontrivial matrix
multiplication algorithm. It could require 125 scalar multiplications or more. To construct the
algorithm induced by the given TPP triple we have to construct the embeddings $A \mapsto e_A$ and
$B \mapsto e_B$ of the matrices $A = [a_{s,t}]$ and $B = [b_{t,u}]$ in $\mathbb{C}[G]$:

$$a_{s,t} \mapsto a_{s,t}\, s^{-1}t, \qquad b_{t,u} \mapsto b_{t,u}\, t^{-1}u \qquad \text{for all } s \in S,\ t \in T,\ u \in U. \tag{5}$$

The next step is to apply Wedderburn's structure theorem:

$$\mathbb{C}[G] \cong \mathbb{C}^{d_1 \times d_1} \times \mathbb{C}^{d_2 \times d_2} \times \cdots \times \mathbb{C}^{d_\ell \times d_\ell}, \tag{6}$$

where $d_1, \ldots, d_\ell$ are the character degrees of $G$. The given matrices $A$ and $B$ are now represented
by $\ell$-tuples of matrices $e_A \mapsto (A_1, \ldots, A_\ell)$ and $e_B \mapsto (B_1, \ldots, B_\ell)$. The last step is easy: just
use the best known algorithms to compute the products $A_i B_i$ or try to make use of the structures
(e.g., symmetries, zero entries, ...) in $A_i$ and $B_i$ to find even better algorithms for the small
products $A_i B_i$. The back transformation works as in Equation (5) but in the other direction.

Note that it could be possible to use the structure of the zero entries in $A_i$ and $B_i$: There is
space for $d_i^2$ entries in each small matrix. Over all small matrices together we have enough space
for $d_1^2 + \cdots + d_\ell^2 = |G|$ elements. But we only need space for $|S| \cdot |T|$ resp. $|T| \cdot |U|$ elements.

The key questions for future research are:

(Q1) Are there different embeddings (6), in the sense that they lead to different structures
(pattern of zeros or other types) in the small matrices?

(Q2) Does the number $M(e)$ of multiplications needed to compute the product in (6) depend
on the embedding $e$?

(Q3) If so, we can bound $R(G)$ by $\min_e M(e)$. How many embeddings $e$ are there and how
easy is it to compute $\min_e M(e)$?
Example. Consider the alternating group $A_5$ on five elements. The character degree pattern is
$(1^1, 3^2, 4^1, 5^1)$ and so

$$\mathbb{C}[A_5] \cong \mathbb{C} \times \mathbb{C}^{3 \times 3} \times \mathbb{C}^{3 \times 3} \times \mathbb{C}^{4 \times 4} \times \mathbb{C}^{5 \times 5}.$$

We know that $A_5$ realizes (5, 5, 5). There is place for 60 elements in the embedding $e_A \in \mathbb{C}[A_5]$
of a 5 x 5 matrix $A$ with 25 elements. The same holds for $e_B$. So we have to embed at most
$|S^{-1}T \cup T^{-1}U| \le |S^{-1}T| + |T^{-1}U| - 1 \le 25 + 25 - 1 = 49$ elements into a space of $|A_5| = 60$
elements. Assume that we can fill the lower dimensional parts of the right hand side of (6)
completely. Thus, only $49 - 1^2 - 3^2 - 3^2 - 4^2 = 14$ elements of the small matrices in $\mathbb{C}^{5 \times 5}$ are non-zero.
Therefore it could be possible that $A_5$ induces a nontrivial matrix multiplication algorithm:
For the first "complete" parts we need $R(1) + 2R(3) + R(4) = 96$ scalar multiplications. We have
28 scalar multiplications left to compute the product $A_5 B_5$ to beat 125 scalar multiplications.
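
The arithmetic of this example can be retraced with a few lines of Python. The sketch below is illustrative only; the rank values R(1) = 1, R(3) = 23 and R(4) = 49 are assumed here as the upper bounds that reproduce the total of 96 quoted above, and all names are hypothetical.

```python
CHARACTER_DEGREES_A5 = [1, 3, 3, 4, 5]
# Assumed upper bounds for the rank of d x d matrix multiplication.
ASSUMED_RANK = {1: 1, 3: 23, 4: 49}

# Dimension of the group algebra: sum of squared character degrees.
algebra_dimension = sum(d * d for d in CHARACTER_DEGREES_A5)

# At most 25 + 25 - 1 group elements carry entries of A or B.
embedded_elements = 25 + 25 - 1

# Entries left for the 5 x 5 block if all smaller blocks are filled completely.
entries_in_largest_block = embedded_elements - sum(d * d for d in CHARACTER_DEGREES_A5[:-1])

# Multiplications spent on the blocks of size 1, 3, 3 and 4.
cost_small_blocks = sum(ASSUMED_RANK[d] for d in CHARACTER_DEGREES_A5[:-1])

# Budget for the 5 x 5 block if the total is to stay below 125.
budget_largest_block = 124 - cost_small_blocks

print(algebra_dimension, entries_in_largest_block, cost_small_blocks, budget_largest_block)
```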

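Example. The symmetric group $G := S_3$ on three objects realizes (2, 2, 2) via the TPP triple
$S = \{s_1 = 1_G, s_2 = (1,2)\}$, $T = \{t_1 = 1_G, t_2 = (1,3)\}$, $U = \{u_1 = 1_G, u_2 = (2,3)\}$. We identify
$A_{i,j}$ with $A_{s_i,t_j}$ and $B_{j,k}$ with $B_{t_j,u_k}$. The transformation into $\mathbb{C}[G]$ results in

$$c_1 := a_{11} 1_G + a_{12}(1,3) + a_{21}(1,2) + a_{22}(1,3,2),$$

$$c_2 := b_{11} 1_G + b_{12}(2,3) + b_{21}(1,3) + b_{22}(1,3,2).$$

The character degree pattern of $S_3$ is $(1^2, 2^1)$, so $\mathbb{C}[G] \cong \mathbb{C} \times \mathbb{C} \times \mathbb{C}^{2 \times 2}$. To construct the map
$f\colon \mathbb{C}[G] \to \mathbb{C} \times \mathbb{C} \times \mathbb{C}^{2 \times 2}$, we follow [1, Example 13.37]. The irreducible representations of $S_3$
are

(1) The trivial representation $\tau\colon S_3 \to \mathbb{C}$, $g \mapsto 1$.

(2) The alternating representation $\alpha\colon S_3 \to \mathbb{C}$, $g \mapsto \operatorname{sgn}(g)$.

(3) The representation $\rho\colon S_3 \to \mathbb{C}^{2 \times 2}$, $(2,3) \mapsto \begin{bmatrix} 1 & -1 \\ 0 & -1 \end{bmatrix}$, $(1,2,3) \mapsto \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix}$.

Thus, we conclude

$$f\Bigl(\sum_{g \in S_3} \lambda_g\, g\Bigr) = \Bigl(\sum_{g \in S_3} \lambda_g\, \tau(g),\ \sum_{g \in S_3} \lambda_g\, \alpha(g),\ \sum_{g \in S_3} \lambda_g\, \rho(g)\Bigr).$$

It follows that

$$f(c_1) = \left(a_{11} + a_{12} + a_{21} + a_{22},\ a_{11} + a_{22} - a_{12} - a_{21},\ \begin{bmatrix} a_{11} - a_{22} - a_{12} & a_{21} + a_{22} \\ a_{21} - a_{22} - a_{12} & a_{11} + a_{12} \end{bmatrix}\right),$$

$$f(c_2) = \left(b_{11} + b_{12} + b_{21} + b_{22},\ b_{11} + b_{22} - b_{12} - b_{21},\ \begin{bmatrix} b_{11} + b_{12} - b_{21} - b_{22} & b_{22} - b_{12} \\ -b_{21} - b_{22} & b_{11} - b_{12} + b_{21} \end{bmatrix}\right).$$

In $\mathbb{C}$ we compute the product with 1 multiplication. In $\mathbb{C}^{2 \times 2}$ we can use Strassen's algorithm
with 7 multiplications. Therefore, we need 9 multiplications to calculate $f(c_1) f(c_2)$.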
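
The $S_3$ example can be checked numerically. The following self-contained sketch is an illustration rather than the authors' implementation: it assumes that permutations are composed right to left and that the two-dimensional representation sends (2,3) to [[1, -1], [0, -1]] and (1,2,3) to [[0, -1], [1, -1]], the choice that reproduces the 2 x 2 blocks stated in the example. All function names are hypothetical.

```python
from itertools import product

# Permutations of {0, 1, 2} as image tuples; compose(p, q) applies q first.
E, P12, P13, P23 = (0, 1, 2), (1, 0, 2), (2, 1, 0), (0, 2, 1)
P123, P132 = (1, 2, 0), (2, 0, 1)

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0, 0, 0]
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def embed(matrix, rows, cols):
    """Return the sum of matrix[i][j] * rows[i]^{-1} cols[j] as {permutation: coefficient}."""
    element = {}
    for (i, x), (j, y) in product(enumerate(rows), enumerate(cols)):
        g = compose(inverse(x), y)
        element[g] = element.get(g, 0) + matrix[i][j]
    return element

def algebra_product(c1, c2):
    """Multiply two group-algebra elements given as {permutation: coefficient}."""
    result = {}
    for (g, x), (h, y) in product(c1.items(), c2.items()):
        gh = compose(g, h)
        result[gh] = result.get(gh, 0) + x * y
    return result

def matmul(X, Y):
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

# Assumed 2-dimensional representation, fixed on the generators (2,3) and (1,2,3).
R23, R123 = ((1, -1), (0, -1)), ((0, -1), (1, -1))
RHO = {E: ((1, 0), (0, 1)), P23: R23, P123: R123}
for g, h in ((P123, P123), (P123, P23), (P23, P123)):
    RHO[compose(g, h)] = matmul(RHO[g], RHO[h])

def rho_component(element):
    """Third component of f: the sum of coefficient * rho(g)."""
    return tuple(tuple(sum(c * RHO[g][i][j] for g, c in element.items())
                       for j in range(2)) for i in range(2))

S, T, U = [E, P12], [E, P13], [E, P23]
A = [[2, 3], [5, 7]]
B = [[11, 13], [17, 19]]
c1, c2 = embed(A, S, T), embed(B, T, U)
c = algebra_product(c1, c2)

# The entries of AB are the coefficients of s^{-1} u in the product c1 * c2.
recovered = [[c.get(compose(inverse(s), u), 0) for u in U] for s in S]
expected = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
assert recovered == expected

# The 2 x 2 block of f(c1) matches the closed form given in the example.
(a11, a12), (a21, a22) = A
assert rho_component(c1) == ((a11 - a22 - a12, a21 + a22),
                             (a21 - a22 - a12, a11 + a12))

# f is multiplicative on the 2 x 2 block.
assert rho_component(c) == matmul(rho_component(c1), rho_component(c2))
print(recovered)
```

The sketch multiplies in the group algebra directly; replacing that step by one scalar product per one-dimensional component and Strassen's seven products for the 2 x 2 block gives the count of 9 multiplications stated above.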

This method provides a way to construct the multiplication algorithm induced by a given TPP
triple. If it works (that is, if one can answer questions (Q1), (Q2) and (Q3)), we find new best
or at least nontrivial matrix multiplication algorithms for matrices of small dimension. Another
approach to multiply matrices with a given TPP triple can be found in Gonzalez-Sanchez et al.
[7]. But as far as we know, this approach doesn't construct the matrix multiplication algorithm
itself.

6. Conclusions 

From our point of view there are five open key questions or ideas one could use for future work.

The first two are obviously the (5, 5, 5) search in the C2 list, together with a practicable
method to construct a matrix multiplication algorithm out of a given TPP triple. The third is
C1-like searches for (6, 6, 6) matrix multiplication algorithms and higher.

Is it easy and efficient to implement a search algorithm that uses products of quotient
sets as in Theorem 1.12?

Is there a constructive algorithm for TPP triples of a given type (n, p, m)?



As far as we know, the smallest example for a non-trivial matrix multiplication realized by the
group-theoretic framework by Cohn and Umans is (40, 40, 40). The group $G = C_n^3 \wr C_2$ realizes
$(2n(n-1), 2n(n-1), 2n(n-1))$ with the rank $R(G) = 2|G| - T(G) = 4n^6 - \frac{1}{2}(n^6 + 3n^3) = \frac{1}{2} n^3 (7n^3 - 3)$,
see [2, Section 2] for details. Thus, for $n = 5$ it realizes 40x40 matrix multiplication
with 54,500 scalar multiplications. This is considerably better than the naive matrix multiplication
algorithm with $40^3 = 64,000$ scalar multiplications. On the other hand this is not a good result
at all: Using $R(40) = R(2^3 \cdot 5) \le R(2)^3 R(5) \le 7^3 \cdot 100 = 34,300$ we get an even better algorithm.
The best known upper bound for the number of scalar multiplications in this case is

$$\frac{n^3 + 12n^2 + 11n}{3} = \frac{40^3 + 12 \cdot 40^2 + 11 \cdot 40}{3} = 27,880$$

by [?, Proposition 2]. Maybe our new algorithm can help to find a minimal working example
for a non-trivial matrix multiplication algorithm realized with the group-theoretic framework by
Cohn and Umans.
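
The numbers in this comparison can be retraced with a short Python sketch. It is only an arithmetic check with hypothetical function names; it reads the rank formula as R(G) = 2|G| - T(G) with |G| = 2n^6 and T(G) = (n^6 + 3n^3)/2, which is an assumption about the garbled display.

```python
def realized_dimension(n):
    """Dimension 2n(n - 1) of the square matrix multiplication realized by the group."""
    return 2 * n * (n - 1)


def rank_via_characters(n):
    """R(G) = 2|G| - T(G) with |G| = 2n^6 and T(G) = (n^6 + 3n^3) / 2."""
    group_order = 2 * n**6
    number_of_characters = (n**6 + 3 * n**3) // 2
    return 2 * group_order - number_of_characters


def rank_closed_form(n):
    """Closed form n^3 (7n^3 - 3) / 2."""
    return n**3 * (7 * n**3 - 3) // 2


def tensor_product_bound():
    """R(2)^3 * R(5) with the bounds R(2) <= 7 and R(5) <= 100."""
    return 7**3 * 100


def best_known_bound(n):
    """(n^3 + 12n^2 + 11n) / 3."""
    return (n**3 + 12 * n**2 + 11 * n) // 3


n = 5
print(realized_dimension(n), rank_via_characters(n), rank_closed_form(n))
print(40**3, tensor_product_bound(), best_known_bound(40))
```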
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